Optimal Tuning of Continual Online Exploration in Reinforcement Learning
Authors
Abstract
This paper presents a framework for tuning continual exploration optimally. It first quantifies the rate of exploration by defining the degree of exploration of a state as the entropy of the probability distribution for choosing an admissible action in that state. The exploration/exploitation tradeoff is then stated as a global optimization problem: find the exploration strategy that minimizes the expected cumulative cost while maintaining fixed degrees of exploration at the nodes. In other words, “exploitation” is maximized for constant “exploration”. This formulation leads to a set of nonlinear updating rules reminiscent of the value-iteration algorithm. Convergence of these rules to a local minimum can be proved for a stationary environment. Interestingly, in the deterministic case, when there is no exploration, these equations reduce to the Bellman equations for finding the shortest path, whereas when exploration is maximal, a full “blind” exploration is performed.
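The two limiting cases described in the abstract can be illustrated with a minimal sketch. The abstract does not state the updating rules, so the code below assumes a Boltzmann (softmin) action distribution with an inverse temperature `theta`, which is one common way to trade expected cost against entropy; the function names, the toy graph, and `theta` itself are hypothetical and are not taken from the paper.

```python
import math

# Toy deterministic graph (an assumption for illustration):
# state -> {next_state: immediate cost}. "C" is the goal state.
TOY_GRAPH = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"C": 1.0},
    "C": {},
}


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def boltzmann(costs, theta):
    """Softmin distribution: lower cost gets higher probability.

    theta = 0 gives the uniform distribution (maximum entropy);
    a large theta concentrates the mass on the cheapest action.
    """
    m = min(costs)  # subtract the minimum for numerical stability
    weights = [math.exp(-theta * (c - m)) for c in costs]
    z = sum(weights)
    return [w / z for w in weights]


def soft_value_iteration(graph, goal, theta, sweeps=50):
    """Value-iteration-like sweeps with a Boltzmann policy (assumed form).

    Each state's value is the expected cost-to-go under the softmin
    policy over (immediate cost + value of the successor state).
    """
    values = {s: 0.0 for s in graph}
    for _ in range(sweeps):
        for state, edges in graph.items():
            if state == goal:
                continue  # the goal is absorbing with zero cost
            totals = [cost + values[nxt] for nxt, cost in edges.items()]
            probs = boltzmann(totals, theta)
            values[state] = sum(p * t for p, t in zip(probs, totals))
    return values
```

Under these assumptions, calling `soft_value_iteration(TOY_GRAPH, "C", theta=50.0)` behaves like the shortest-path Bellman recursion (almost no exploration), while `theta=0.0` averages uniformly over the admissible actions, which corresponds to fully "blind" exploration. The function `entropy(boltzmann(...))` gives the degree of exploration at a state for a given `theta`.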
Similar resources
Tuning Continual Exploration in Reinforcement Learning
This paper presents a model for tuning continual exploration optimally by integrating exploration and exploitation in a common framework. It first quantifies exploration by defining the degree of exploration of a state as the entropy of the probability distribution for choosing an admissible action. Then, the exploration/exploitation tradeoff is formulated as a global optimization...
Tuning continual exploration in reinforcement learning: An optimality property of the Boltzmann strategy
This paper presents a model for tuning continual exploration optimally by integrating exploration and exploitation in a common framework. It first quantifies exploration by defining the degree of exploration of a state as the entropy of the probability distribution for choosing an admissible action in that state. Then, the exploration/exploitation tradeoff is formulated as a globa...
Reinforcement Learning Based PID Control of Wind Energy Conversion Systems
In this paper, an adaptive PID controller for Wind Energy Conversion Systems (WECS) has been developed. The adaptation technique applied to this controller is based on Reinforcement Learning (RL) theory. Nonlinear characteristics of wind variations as plant input, wind turbine structure and generator operational behavior demand a high-quality adaptive controller to ensure both robust stability an...
Learning exploration strategies in model-based reinforcement learning
Reinforcement learning (RL) is a paradigm for learning sequential decision making tasks. However, typically the user must hand-tune exploration parameters for each different domain and/or algorithm that they are using. In this work, we present an algorithm called leo for learning these exploration strategies on-line. This algorithm makes use of bandit-type algorithms to adaptively select explor...
A Monte Carlo-Based Search Strategy for Dimensionality Reduction in Performance Tuning Parameters
Redundant and irrelevant features in high dimensional data increase the complexity in underlying mathematical models. It is necessary to conduct pre-processing steps that search for the most relevant features in order to reduce the dimensionality of the data. This study made use of a meta-heuristic search approach which uses lightweight random simulations to balance between the exploitation of ...